Cross-Validation

Authors

  • Payam Refaeilzadeh
  • Lei Tang
  • Huan Liu
Abstract

Definition: Cross-validation is a statistical method of evaluating and comparing learning algorithms by dividing data into two segments: one used to learn or train a model and the other used to validate the model. In typical cross-validation, the training and validation sets must cross over in successive rounds such that each data point has a chance of being validated against. The basic form of cross-validation is k-fold cross-validation. Other forms of cross-validation are special cases of k-fold cross-validation or involve repeated rounds of k-fold cross-validation. In k-fold cross-validation the data is first partitioned into k equally (or nearly equally) sized segments or folds. Subsequently, k iterations of training and validation are performed such that within each iteration a different fold of the data is held out for validation while the remaining k − 1 folds are used for learning. Fig. 1 demonstrates an example with k = 3: the darker sections of the data are used for training while the lighter sections are used for validation. In data mining and machine learning, 10-fold cross-validation (k = 10) is the most common.

Cross-validation is used to evaluate or compare learning algorithms as follows: in each iteration, one or more learning algorithms use k − 1 folds of data to learn one or more models, and subsequently the learned models are asked to make predictions about the data in the validation fold. The performance of each learning algorithm on each fold can be tracked using some predetermined performance metric, such as accuracy. Upon completion, k samples of the performance metric will be available for each algorithm. Different methodologies, such as averaging, can be used to obtain an aggregate measure from these samples, or the samples can be used in a statistical hypothesis test to show that one algorithm is superior to another.
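The procedure above can be made concrete with a short sketch. The snippet below is a minimal illustration of 10-fold cross-validation (k = 10) in Python; scikit-learn, the iris dataset, and the k-nearest-neighbour classifier are illustrative assumptions and are not part of the original article.

    # A minimal sketch of the 10-fold cross-validation procedure described above.
    # Assumes scikit-learn and NumPy are installed; the iris data and the
    # k-nearest-neighbour classifier are illustrative, not from the article.
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import KFold
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import accuracy_score

    X, y = load_iris(return_X_y=True)
    kfold = KFold(n_splits=10, shuffle=True, random_state=0)

    fold_scores = []
    for train_idx, valid_idx in kfold.split(X):
        # Train on the k - 1 folds, then validate on the held-out fold.
        model = KNeighborsClassifier().fit(X[train_idx], y[train_idx])
        predictions = model.predict(X[valid_idx])
        fold_scores.append(accuracy_score(y[valid_idx], predictions))

    # Aggregate the k per-fold accuracies, e.g. by averaging.
    print("mean accuracy over 10 folds:", np.mean(fold_scores))

When the goal is to compare two learning algorithms rather than to report a single score, the k per-fold metrics collected this way can instead be fed into a paired statistical hypothesis test over the folds, as the abstract notes.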

Related articles

Determining optimal value of the shape parameter $c$ in RBF for unequal distances topographical points by Cross-Validation algorithm

Several radial basis function-based methods contain a free shape parameter which has a crucial role in the accuracy of the methods. Performance evaluation of this parameter in different functions with various data has always been a topic of study. In the present paper, we consider studying the methods which determine an optimal value for the shape parameter in interpolations of radial basis ...


Customer Validation in Cross-Dock

Considering the importance of customer validation in the cross-dock, and since this is one of the problems of implementing a cross-dock system in Iran, this study attempted to extract customer validation criteria. The purpose of the research is to eliminate the distrust of distributors in receiving the funds for the sent items, and the statistical sample of this research is the experts of the sy...


Cross-Validation with Active Pattern Selection

We propose a new approach for leave-one-out cross-validation of neural network classifiers called "cross-validation with active pattern selection" (CV/APS). In CV/APS, the contribution of the training patterns to network learning is estimated and this information is used for active selection of CV patterns. On the tested examples, the computational cost of CV can be drastically reduced with on...


Correcting for Optimistic Prediction in Small Data Sets

The C statistic is a commonly reported measure of screening test performance. Optimistic estimation of the C statistic is a frequent problem because of overfitting of statistical models in small data sets, and methods exist to correct for this issue. However, many studies do not use such methods, and those that do correct for optimism use diverse methods, some of which are known to be biased. W...


Cross-study validation for the assessment of prediction algorithms

Motivation: Numerous competing algorithms for prediction in high-dimensional settings have been developed in the statistical and machine-learning literature. Learning algorithms and the prediction models they generate are typically evaluated on the basis of cross-validation error estimates in a few exemplary datasets. However, in most applications, the ultimate goal of prediction modeling is to ...



Journal:

Volume   Issue

Pages  -

Year of publication: 2008